
Creators/Authors contains: "Lipka, Alexander E."


  1. Abstract

    Genomic regions containing loci whose effect sizes interact with environmental factors are desirable targets for selection because growing seasons are increasingly unpredictable. Although selecting on such gene‐by‐environment (G × E) loci is vital, identifying significantly associated loci is challenging because of the multiple testing correction. Consequently, G × E loci with small‐to‐moderate effect sizes may never be identified via traditional genome‐wide association studies (GWAS). Variance GWAS (vGWAS) has previously been shown to identify G × E loci. Because vGWAS also inherently reduces the severity of multiple testing, we hypothesized that it could be used successfully to identify genomic regions likely to contain G × E effects. We used publicly available genotypic and phenotypic data in maize (Zea mays L.) to test the ability of two vGWAS approaches to identify G × E loci controlling two flowering traits. We observed high inflation of from both approaches. This suggests that these two vGWAS approaches are not suitable for the task of identifying G × E loci. We advocate that similar future applications of vGWAS use more sophisticated models that can adequately control the inflation of . Otherwise, the application of vGWAS to search for G × E effects that are critical for combating the effects of climate change will not reach its full potential.

     
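Inflation of association test statistics is commonly summarized with the genomic inflation factor, λ: the median observed 1-df chi-square statistic divided by its expected null median (about 0.455). The abstract does not say which diagnostic the study used, so the following is only a generic Python sketch, assuming two-sided p-values from 1-df tests; the function names are illustrative.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df: (Phi^-1(0.75))^2, about 0.455
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def chi2_1df_from_p(p: float) -> float:
    """Convert a two-sided p-value into the equivalent 1-df chi-square statistic."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)  # lower-tail normal quantile; its square is the statistic
    return z * z


def genomic_inflation_factor(p_values: list[float]) -> float:
    """lambda_GC: median observed 1-df chi-square divided by its expected null median.

    Values near 1 suggest well-calibrated tests; values well above 1 suggest inflation.
    """
    stats = [chi2_1df_from_p(p) for p in p_values if 0.0 < p <= 1.0]
    if not stats:
        raise ValueError("no p-values in the interval (0, 1]")
    return median(stats) / CHI2_1DF_MEDIAN
```

Uniformly distributed null p-values give λ close to 1; a λ well above 1 indicates that the test statistics are inflated relative to the null.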
  2. Abstract Background

    Given the economic and environmental importance of allopolyploids and other species with highly duplicated genomes, there is a need for methods to distinguish paralogs, i.e. duplicate sequences within a genome, from Mendelian loci, i.e. single copy sequences that pair at meiosis. The ratio of observed to expected heterozygosity is an effective tool for filtering loci but requires genotyping to be performed first at a high computational cost, whereas counting the number of sequence tags detected per genotype is computationally quick but very ineffective in inbred or polyploid populations. Therefore, new methods are needed for filtering paralogs.

    Results

    We introduce a novel statistic, H_ind/H_E, that uses the probability that two reads sampled from a genotype will belong to different alleles, instead of observed heterozygosity. The expected value of H_ind/H_E is the same across all loci in a dataset, regardless of read depth or allele frequency. In contrast to methods based on observed heterozygosity, it can be estimated and used for filtering loci prior to genotype calling. In addition to filtering paralogs, it can be used to filter loci with null alleles or high overdispersion, and to identify individuals with unexpected ploidy and hybrid status. We demonstrate that the statistic is useful at read depths as low as 5 to 10, well below the depth needed for accurate genotype calling in polyploid and outcrossing species.

    Conclusions

    Our methodology for estimating H_ind/H_E across loci and individuals, as well as for determining reasonable thresholds for filtering loci, is implemented in polyRAD v1.6, available at https://github.com/lvclark/polyRAD. In large sequencing datasets, we anticipate that the ability to filter markers and identify problematic individuals prior to genotype calling will save researchers considerable computational time.

     
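Because H_ind depends only on allele read counts, it can be computed before any genotype is called. The Python sketch below is a minimal illustration under stated assumptions: reads are drawn without replacement, H_E = 1 − Σp², and allele frequencies are taken as the mean per-individual read proportion. It is not polyRAD's implementation (polyRAD is an R package), and polyRAD's exact estimator and filtering thresholds may differ.

```python
def h_ind(read_counts: list[int]) -> float:
    """Probability that two reads drawn without replacement from one individual
    at one locus come from different alleles."""
    n = sum(read_counts)
    if n < 2:
        raise ValueError("at least two reads are required")
    same_allele_pairs = sum(c * (c - 1) for c in read_counts)
    return 1.0 - same_allele_pairs / (n * (n - 1))


def hind_he(samples: list[list[int]]) -> float:
    """Locus-level H_ind / H_E from per-individual allele read counts (no genotype calls).

    Illustrative assumptions (not polyRAD's exact estimator): allele frequencies are the
    mean of per-individual read proportions, H_E = 1 - sum(p^2), and individuals with
    fewer than two reads are skipped.
    """
    usable = [s for s in samples if sum(s) >= 2]
    if not usable:
        raise ValueError("no individual has at least two reads")
    n_alleles = len(usable[0])
    freqs = [sum(s[a] / sum(s) for s in usable) / len(usable) for a in range(n_alleles)]
    h_e = 1.0 - sum(p * p for p in freqs)
    if h_e == 0.0:
        raise ValueError("locus is monomorphic in this sample")
    mean_h_ind = sum(h_ind(s) for s in usable) / len(usable)
    return mean_h_ind / h_e
```

Loci whose ratio departs strongly from the value expected for Mendelian loci in the dataset would be candidates for removal; how that expectation and the threshold are set is handled by polyRAD and is not reproduced here.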
  4. Abstract Background

    Advances in genotyping and phenotyping techniques have enabled the acquisition of a great amount of data. Consequently, there is interest in multivariate statistical analyses that identify genomic regions likely to contain causal mutations affecting multiple traits (i.e., pleiotropy). As the demand for multivariate analyses increases, it is imperative that optimal tools are available to assess their performance. To facilitate the testing and validation of these multivariate approaches, we developed simplePHENOTYPES, an R/CRAN package that simulates pleiotropy, partial pleiotropy, and spurious pleiotropy in a wide range of genetic architectures, including additive, dominance, and epistatic models.

    Results

    We illustrate simplePHENOTYPES’ ability to simulate thousands of phenotypes in less than one minute. We then provide two vignettes illustrating how to simulate sets of correlated traits in simplePHENOTYPES. Finally, we demonstrate the use of results from simplePHENOTYPES in standard GWAS software, as well as the equivalence of phenotypes simulated by simplePHENOTYPES and by other packages with similar capabilities.

    Conclusions

    simplePHENOTYPES is an R/CRAN package that makes it possible to simulate multiple traits controlled by loci with varying degrees of pleiotropy. Its ability to interface with both commonly used marker data formats and downstream quantitative genetics software and packages should facilitate a rigorous assessment of both existing and emerging statistical GWAS and genomic selection (GS) approaches. simplePHENOTYPES is also available on GitHub (https://github.com/samuelbfernandes/simplePHENOTYPES).
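simplePHENOTYPES itself is an R package, and the following is not its interface. As a hypothetical Python sketch of the underlying idea, assuming purely additive effects, 0/1/2 genotype coding, and Gaussian residuals scaled to a target heritability, each trait gets its own set of QTN effects, and a marker listed for more than one trait acts as a pleiotropic QTN:

```python
import random
from statistics import pvariance


def simulate_additive_traits(
    genotypes: list[list[int]],
    qtn_effects: list[dict[int, float]],
    h2: float,
    seed: int = 0,
) -> list[list[float]]:
    """Simulate one phenotype vector per trait from 0/1/2 genotypes.

    qtn_effects holds one {marker_index: additive_effect} dict per trait; a marker
    index present in several dicts acts as a pleiotropic QTN.
    Residual variance is set so that var(g) / (var(g) + var(e)) equals h2.
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must be in (0, 1]")
    rng = random.Random(seed)
    traits = []
    for effects in qtn_effects:
        # Additive genetic value of each individual for this trait
        g = [sum(b * ind[j] for j, b in effects.items()) for ind in genotypes]
        var_g = pvariance(g)
        if var_g == 0.0:
            raise ValueError("QTNs have no genetic variance in this sample")
        sd_e = (var_g * (1.0 - h2) / h2) ** 0.5
        traits.append([gi + rng.gauss(0.0, sd_e) for gi in g])
    return traits
```

Because both traits share the effect of marker 0 in a call such as `simulate_additive_traits(geno, [{0: 1.0, 3: 0.5}, {0: -0.8, 7: 0.4}], h2=0.5)`, their phenotypes become correlated through pleiotropy rather than through LD.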
  5. Abstract
    Quantification of the simultaneous contributions of loci to multiple traits, a phenomenon called pleiotropy, is facilitated by the increased availability of high-throughput genotypic and phenotypic data. To understand the prevalence and nature of pleiotropy, the ability of multivariate and univariate genome-wide association study (GWAS) models to distinguish between pleiotropic and non-pleiotropic loci in linkage disequilibrium (LD) first needs to be evaluated. Therefore, we used publicly available maize and soybean genotypic data to simulate multiple pairs of traits that were either (i) controlled by quantitative trait nucleotides (QTNs) on separate chromosomes, (ii) controlled by QTNs in various degrees of LD with each other, or (iii) controlled by a single pleiotropic QTN. We showed that multivariate GWAS could not distinguish between QTNs in LD and a single pleiotropic QTN. In contrast, a unique QTN detection rate pattern was observed for univariate GWAS whenever the simulated QTNs were in high LD or pleiotropic. Collectively, these results suggest that multivariate and univariate GWAS should both be used to infer whether or not causal mutations underlying peak GWAS associations are pleiotropic. Therefore, we recommend that future studies use a combination of multivariate and univariate GWAS models, as both models could be useful for identifying and narrowing down candidate loci with potential pleiotropic effects for downstream biological experiments. 
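Whether separate QTNs in LD can be told apart from a single pleiotropic QTN depends on how strongly the markers are correlated. LD between two biallelic markers is often summarized as r²; the Python sketch below uses the squared Pearson correlation of 0/1/2 genotype dosages, a common approximation that does not need phased haplotypes. The abstract does not state which LD measure the study used.

```python
def ld_r2(marker_a: list[int], marker_b: list[int]) -> float:
    """Squared Pearson correlation between two 0/1/2-coded markers (genotype-based r^2)."""
    n = len(marker_a)
    if n != len(marker_b) or n < 2:
        raise ValueError("markers must have the same length (>= 2)")
    mean_a = sum(marker_a) / n
    mean_b = sum(marker_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(marker_a, marker_b))
    var_a = sum((a - mean_a) ** 2 for a in marker_a)
    var_b = sum((b - mean_b) ** 2 for b in marker_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("monomorphic marker: r^2 is undefined")
    return cov * cov / (var_a * var_b)
```

An r² near 1 means the two markers carry nearly the same information, which is the situation in which the abstract reports that multivariate GWAS could not distinguish two linked QTNs from one pleiotropic QTN.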
  6. Abstract

    Accurately quantifying the simultaneous effects of multiple genomic loci on multiple traits is now possible due to current and emerging high‐throughput genotyping and phenotyping technologies. To date, most efforts to quantify these genotype‐to‐phenotype relationships have focused on either multi‐trait models that test a single marker at a time or multi‐locus models that quantify associations with a single trait. Therefore, the purpose of this study was to compare a multi‐trait, multi‐locus stepwise (MSTEP) model selection procedure that we developed with (a) a commonly used multi‐trait single‐locus model and (b) a univariate multi‐locus model. We used real marker data in maize (Zea mays L.) and soybean (Glycine max L.) to simulate multiple traits controlled by various combinations of pleiotropic and nonpleiotropic quantitative trait nucleotides (QTNs). In general, we found that both multi‐trait models outperformed the univariate multi‐locus model, especially when analyzing a trait of low heritability. For traits controlled by either a combination of pleiotropic and nonpleiotropic QTNs or a large number of QTNs (i.e., 50), our MSTEP model often outperformed at least one of the two alternative models. When applied to the analysis of two tocochromanol‐related traits in maize grain, MSTEP identified the same peak‐associated marker that was reported in a previous study. We therefore conclude that MSTEP is a useful addition to the suite of statistical models commonly used to gain insight into the genetic architecture of agronomically important traits.

     
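The abstract does not describe MSTEP's entry, exit, or stopping criteria. To illustrate only the general idea of multi-trait, multi-locus stepwise selection, the Python sketch below swaps in a simpler technique, forward stagewise selection: each step adds the marker that most reduces the residual sum of squares summed across all traits, then updates the residuals. The `max_steps` and `min_gain` stopping rules are hypothetical, and earlier effects are not refitted, as they would be in a full stepwise regression.

```python
def _center(v: list[float]) -> list[float]:
    m = sum(v) / len(v)
    return [x - m for x in v]


def _dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def multitrait_forward_stagewise(
    markers: list[list[float]],
    traits: list[list[float]],
    max_steps: int = 5,
    min_gain: float = 1e-8,
) -> list[int]:
    """Greedy marker selection shared across traits (simplified; not MSTEP).

    At each step, the marker whose simple regression on the current residuals gives
    the largest drop in residual sum of squares, summed over all traits, is added.
    """
    xs = [_center(m) for m in markers]
    residuals = [_center(y) for y in traits]
    selected: list[int] = []
    for _ in range(max_steps):
        best, best_gain = None, min_gain
        for j, x in enumerate(xs):
            xx = _dot(x, x)
            if j in selected or xx == 0.0:
                continue
            # RSS reduction from regressing each trait's residuals on marker j
            gain = sum(_dot(x, r) ** 2 / xx for r in residuals)
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:
            break  # no remaining marker improves the fit by more than min_gain
        x = xs[best]
        xx = _dot(x, x)
        residuals = [
            [ri - (_dot(x, r) / xx) * xi for ri, xi in zip(r, x)] for r in residuals
        ]
        selected.append(best)
    return selected
```

Summing the gain over traits lets a marker with moderate effects on several traits (a pleiotropic candidate) enter ahead of one with a slightly larger effect on a single trait.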
  7. Abstract

    Maize inflorescence is a complex phenotype that involves the physical and developmental interplay of multiple traits. Given the evidence that genes could pleiotropically contribute to several of these traits, we used publicly available maize data to assess the ability of multivariate genome-wide association study (GWAS) approaches to identify pleiotropic quantitative trait loci (pQTL). Our analysis of 23 publicly available inflorescence and leaf-related traits in a diversity panel of n = 281 maize lines genotyped with 376,336 markers revealed that the two multivariate GWAS approaches we tested were capable of identifying pQTL in genomic regions coinciding with similar associations found in previous studies. We then conducted a parallel simulation study on the same individuals, which showed that multivariate GWAS approaches yielded a higher true-positive quantitative trait nucleotide (QTN) detection rate than comparable univariate approaches for all evaluated simulation settings except when the correlated simulated traits had a heritability of 0.9. We therefore conclude that state-of-the-art multivariate GWAS approaches are a useful tool for dissecting pleiotropy, and that their more widespread implementation could facilitate the discovery of genes and other biological mechanisms underlying maize inflorescence.
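In simulation studies, the true-positive QTN detection rate is the fraction of simulated QTNs recovered by at least one significant marker. How the study matched signals to QTNs is not given in the abstract; the Python sketch below assumes a QTN counts as detected when a significant marker lies within a fixed base-pair window on the same chromosome, with the window size as a hypothetical parameter.

```python
from bisect import bisect_left
from collections import defaultdict


def qtn_detection_rate(
    qtns: list[tuple[str, int]],
    significant: list[tuple[str, int]],
    window: int,
) -> float:
    """Fraction of simulated QTNs with >= 1 significant marker within +/- window bp
    on the same chromosome. Positions are (chromosome, base-pair) tuples."""
    if not qtns:
        raise ValueError("no QTNs supplied")
    hits_by_chrom: dict[str, list[int]] = defaultdict(list)
    for chrom, pos in significant:
        hits_by_chrom[chrom].append(pos)
    for positions in hits_by_chrom.values():
        positions.sort()
    detected = 0
    for chrom, pos in qtns:
        hits = hits_by_chrom.get(chrom, [])
        i = bisect_left(hits, pos - window)  # first significant marker at or after the window start
        if i < len(hits) and hits[i] <= pos + window:
            detected += 1
    return detected / len(qtns)
```

Sorting the significant positions once per chromosome keeps each QTN lookup to a binary search, which matters when thousands of simulated replicates are scored.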
  8. Abstract
    Plant growth, development, and nutritional quality depend upon amino acid homeostasis, especially in seeds. However, our understanding of the underlying genetics influencing amino acid content and composition remains limited, with only a few candidate genes and quantitative trait loci identified to date. Improved knowledge of the genetics and biological processes that determine amino acid levels will enable researchers to use this information for plant breeding and biological discovery. Toward this goal, we used genomic prediction to identify biological processes that are associated with, and therefore potentially influence, free amino acid (FAA) composition in seeds of the model plant Arabidopsis thaliana. Markers were split into categories based on metabolic pathway annotations and fit using a genomic partitioning model to evaluate the influence of each pathway on heritability explained, model fit, and predictive ability. Selected pathways included processes known to influence FAA composition, albeit to an unknown degree, and spanned four categories: amino acid, core, specialized, and protein metabolism. Using this approach, we identified associations for pathways containing known variants for FAA traits, in addition to finding new trait-pathway associations. Markers related to amino acid metabolism, which are directly involved in FAA regulation, improved predictive ability for branched-chain amino acids and histidine. The use of genomic partitioning also revealed patterns across biochemical families, in which serine-derived FAAs were associated with protein-related annotations and aromatic FAAs were associated with specialized metabolic pathways. Taken together, these findings provide evidence that genomic partitioning is a viable strategy to uncover the relative contributions of biological processes to FAA traits in seeds, offering a promising framework to guide hypothesis testing and narrow the search space for candidate genes.
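Genomic partitioning models typically fit a separate genomic relationship matrix for each marker category (here, markers annotated to a pathway versus all remaining markers) and estimate a variance component for each. The Python sketch below only builds those matrices, assuming VanRaden's method 1 as the relationship kernel, which the abstract does not name; the variance-component fit (for example by REML) is omitted.

```python
def vanraden_grm(genotypes: list[list[int]], marker_subset: list[int]) -> list[list[float]]:
    """VanRaden (method 1) genomic relationship matrix from a subset of markers.

    genotypes: one row per individual, markers coded 0/1/2 (copies of a reference allele).
    G = Z Z' / (2 * sum_j p_j (1 - p_j)), where Z holds genotypes centered by 2 p_j.
    """
    n = len(genotypes)
    freqs = [sum(ind[j] for ind in genotypes) / (2 * n) for j in marker_subset]
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    if scale == 0.0:
        raise ValueError("all markers in the subset are monomorphic")
    z = [[ind[j] - 2.0 * p for j, p in zip(marker_subset, freqs)] for ind in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[k])) / scale for k in range(n)] for i in range(n)]


def partition_grms(genotypes: list[list[int]], category_markers: list[int]):
    """Return (G_category, G_rest): one matrix for the annotated markers, one for the others."""
    in_category = set(category_markers)
    rest = [j for j in range(len(genotypes[0])) if j not in in_category]
    return vanraden_grm(genotypes, list(category_markers)), vanraden_grm(genotypes, rest)
```

Comparing how much variance each matrix absorbs, and how prediction changes when the pathway matrix is added, is what lets a partitioning model attribute heritability and predictive ability to a pathway.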
  9. Martelli, Pier Luigi (Ed.)
    Abstract

    Motivation

    Advances in publicly available sequencing data from large populations have enabled informative genome-wide association studies (GWAS) that associate SNPs with phenotypic traits of interest. Many publicly available tools able to perform GWAS have been developed in response to increased demand. However, these tools lack a comprehensive pipeline that includes pre-GWAS analysis, such as outlier removal, data transformation, and calculation of Best Linear Unbiased Predictions (BLUPs) or Best Linear Unbiased Estimates (BLUEs). Post-GWAS analysis, such as haploblock analysis and candidate gene identification, is likewise lacking.

    Results

    Here, we present Holistic Analysis with Pre- and Post-Integration (HAPPI) GWAS, an open-source GWAS tool able to perform pre-GWAS, GWAS, and post-GWAS analysis in an automated pipeline through a command-line interface.

    Availability and implementation

    HAPPI GWAS is written in R for Unix-like operating systems and is available on GitHub (https://github.com/Angelovici-Lab/HAPPI.GWAS.git).

    Supplementary information

    Supplementary data are available at Bioinformatics online.
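HAPPI GWAS is written in R, and the abstract does not specify which outlier rule it applies. As an assumed example of a pre-GWAS outlier-removal step, the Python sketch below flags phenotype values outside Tukey's fences (Q1 − k·IQR, Q3 + k·IQR). Here k = 1.5 is the conventional default, not a HAPPI setting.

```python
from statistics import quantiles


def tukey_outlier_filter(
    values: list[float], k: float = 1.5
) -> tuple[list[float], list[float]]:
    """Split phenotype values into (kept, removed) using Tukey's fences:
    values outside [Q1 - k*IQR, Q3 + k*IQR] are flagged as outliers."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    removed = [v for v in values if not low <= v <= high]
    return kept, removed
```

Returning the removed values alongside the kept ones makes it easy to log which observations were dropped before transformation or BLUP/BLUE calculation.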